Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | の | 26 | や |
2 | 、 | 27 | 人 |
3 | 。 | 28 | 年 |
4 | を | 29 | など |
5 | に | 30 | だ |
6 | は | 31 | です |
7 | た | 32 | さん |
8 | が | 33 | ある |
9 | で | 34 | か |
10 | て | 35 | 者 |
11 | と | 36 | 市 |
12 | し | 37 | なっ |
13 | も | 38 | へ |
14 | ・ | 39 | 【 |
15 | する | 40 | 月 |
16 | な | 41 | 日本 |
17 | から | 42 | 】 |
18 | 日 | 43 | よう |
19 | いる | 44 | まで |
20 | ない | 45 | 的 |
21 | い | 46 | 県 |
22 | れ | 47 | まし |
23 | さ | 48 | 中 |
24 | こと | 49 | この |
25 | ます | 50 | という |
The table shows the 50 most frequent words of the corpus. At these ranks we usually see stopwords: function words, particles, and punctuation marks.
Language: Afrikaans
This list is a good candidate for a first stopword list for a language.
Usually a small, balanced corpus is enough to get a good list of high-frequency words. But if the small corpus is dominated by one very prominent topic, words from that topic will show up even in the top word lists.
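As a rough illustration of how such a candidate list could be derived, the sketch below counts tokens and keeps the most frequent ones. It assumes the text is already tokenized; the function name `top_words` and the toy English token list are hypothetical and not part of the corpus tooling.

```python
from collections import Counter

def top_words(tokens, n=50):
    """Return the n most frequent tokens as (rank, word) pairs, with rank starting at 1."""
    counts = Counter(tokens)
    return [
        (rank, word)
        for rank, (word, _count) in enumerate(counts.most_common(n), start=1)
    ]

# Hypothetical, pre-tokenized toy corpus (a real corpus would be far larger).
tokens = "the cat sat on the mat and the dog sat on the rug".split()

# The top ranks are the stopword candidates.
stopword_candidates = top_words(tokens, n=3)
for rank, word in stopword_candidates:
    print(rank, word)
```

With a corpus dominated by one topic, content words from that topic would climb into this list, so the output is best treated as a set of candidates to review by hand rather than a finished stopword list.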
select w_id-100 as rank_in_wordlist, word from words where w_id>100 order by w_id limit 50;
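The query above can be tried against a toy database, as in the minimal sketch below using Python's built-in `sqlite3` module. The schema is an assumption read off the query itself: a `words` table with columns `w_id` and `word`, where IDs up to 100 are not ordinary words and the remaining IDs are assigned in descending order of frequency. The placeholder rows and the four sample words are illustrative only.

```python
import sqlite3

# In-memory database with an assumed minimal schema for the words table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT)")

# Assumption: IDs 1..100 are reserved; placeholder rows stand in for them here.
reserved = [(i, f"<reserved-{i}>") for i in range(1, 101)]
# Assumption: real words start at ID 101, ordered by descending frequency.
sample = [(101, "の"), (102, "、"), (103, "。"), (104, "を")]
conn.executemany("INSERT INTO words (w_id, word) VALUES (?, ?)", reserved + sample)

query = """
SELECT w_id - 100 AS rank_in_wordlist, word
FROM words
WHERE w_id > 100
ORDER BY w_id
LIMIT 50
"""

# Each row is (rank_in_wordlist, word); ranks start at 1 because of the offset.
rows = conn.execute(query).fetchall()
for rank, word in rows:
    print(rank, word)
```

Subtracting 100 from `w_id` turns the stored ID directly into the rank shown in the table, which only works if IDs really are assigned in frequency order as assumed here.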
3.4 Sample words for different frequency ranges